netdev CI testing #6666
Open: kuba-moo wants to merge 964 commits into kernel-patches:bpf-next_base from linux-netdev:to-test
+41,237
−12,728
Qdiscs use shinfo->gso_segs for their pkts stats in bstats_update(), but this field needs to be initialized for SKB_GSO_DODGY users. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: NipaLocal <nipa@local>
…it() qdisc_pkt_len_init() is currently initializing qdisc_skb_cb(skb)->pkt_len. Add qdisc_skb_cb(skb)->pkt_segs initialization and rename this function to qdisc_pkt_len_segs_init(). Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Avoid up to two cache line misses in qdisc dequeue() to fetch skb_shinfo(skb)->gso_segs/gso_size while qdisc spinlock is held. This gives a 5 % improvement in a TX intensive workload. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: NipaLocal <nipa@local>
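The idea behind the two commits above can be sketched in plain userspace C. This is a minimal illustration, not the kernel code: every `demo_*` name is hypothetical, the length computation is simplified to `skb->len`, and the rule "non-GSO packets count as one segment" is an assumption made for the sketch. The point it shows is that the segment count is copied into the per-packet control block once, at init time, so the later stats update never has to touch the separate shared-info area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared info area (lives in another cache line). */
struct demo_shinfo {
    uint16_t gso_size;
    uint16_t gso_segs;
};

/* Hypothetical stand-in for the qdisc control block stored inside the packet. */
struct demo_cb {
    uint32_t pkt_len;
    uint16_t pkt_segs;
};

struct demo_skb {
    struct demo_cb cb;
    uint32_t len;
    struct demo_shinfo *shinfo;
};

struct demo_bstats {
    uint64_t bytes;
    uint64_t packets;
};

/* Called once before enqueue: cache length and segment count in the cb. */
static inline void demo_pkt_len_segs_init(struct demo_skb *skb)
{
    assert(skb != NULL && skb->shinfo != NULL);
    skb->cb.pkt_len = skb->len; /* simplified: no per-segment header accounting */
    /* Assumption for this sketch: a non-GSO packet counts as one segment. */
    skb->cb.pkt_segs = skb->shinfo->gso_size ? skb->shinfo->gso_segs : 1;
}

/* Called at dequeue: reads only the cb, never dereferences skb->shinfo. */
static inline void demo_bstats_update(struct demo_bstats *b,
                                      const struct demo_skb *skb)
{
    b->bytes += skb->cb.pkt_len;
    b->packets += skb->cb.pkt_segs;
}
```

With this shape, the only access to `shinfo` happens before the queue lock would be taken, which is the property the commit message credits for the saved cache line misses.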
Use new qdisc_pkt_segs() to avoid a cache line miss in cake_enqueue() for non GSO packets. cake_overhead() does not have to recompute it. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: NipaLocal <nipa@local>
It is possible to reorg Qdisc to avoid always dirtying 2 cache lines in
fast path by reducing this to a single dirtied cache line.
In current layout, we change only four/six fields in the first cache line:
- q.spinlock
- q.qlen
- bstats.bytes
- bstats.packets
- some Qdisc also change q.next/q.prev
In the second cache line we change in the fast path:
- running
- state
- qstats.backlog
/* --- cacheline 2 boundary (128 bytes) --- */
struct sk_buff_head gso_skb __attribute__((__aligned__(64))); /* 0x80 0x18 */
struct qdisc_skb_head q; /* 0x98 0x18 */
struct gnet_stats_basic_sync bstats __attribute__((__aligned__(16))); /* 0xb0 0x10 */
/* --- cacheline 3 boundary (192 bytes) --- */
struct gnet_stats_queue qstats; /* 0xc0 0x14 */
bool running; /* 0xd4 0x1 */
/* XXX 3 bytes hole, try to pack */
unsigned long state; /* 0xd8 0x8 */
struct Qdisc * next_sched; /* 0xe0 0x8 */
struct sk_buff_head skb_bad_txq; /* 0xe8 0x18 */
/* --- cacheline 4 boundary (256 bytes) --- */
Reorganize things to have a first cache line mostly read,
then a mostly written one.
This gives a ~3% increase of performance under tx stress.
Note that there is an additional hole because @qstats now spans over a third cache line.
/* --- cacheline 2 boundary (128 bytes) --- */
__u8 __cacheline_group_begin__Qdisc_read_mostly[0] __attribute__((__aligned__(64))); /* 0x80 0 */
struct sk_buff_head gso_skb; /* 0x80 0x18 */
struct Qdisc * next_sched; /* 0x98 0x8 */
struct sk_buff_head skb_bad_txq; /* 0xa0 0x18 */
__u8 __cacheline_group_end__Qdisc_read_mostly[0]; /* 0xb8 0 */
/* XXX 8 bytes hole, try to pack */
/* --- cacheline 3 boundary (192 bytes) --- */
__u8 __cacheline_group_begin__Qdisc_write[0] __attribute__((__aligned__(64))); /* 0xc0 0 */
struct qdisc_skb_head q; /* 0xc0 0x18 */
unsigned long state; /* 0xd8 0x8 */
struct gnet_stats_basic_sync bstats __attribute__((__aligned__(16))); /* 0xe0 0x10 */
bool running; /* 0xf0 0x1 */
/* XXX 3 bytes hole, try to pack */
struct gnet_stats_queue qstats; /* 0xf4 0x14 */
/* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
__u8 __cacheline_group_end__Qdisc_write[0]; /* 0x108 0 */
/* XXX 56 bytes hole, try to pack */
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
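The read-mostly / write split described in this commit can be approximated in standalone C11. The sketch below is only an analogy: the `demo_*` types are hypothetical stand-ins with illustrative sizes, `_Alignas(64)` plays the role of the kernel's cache line group markers, and a 64-byte cache line is assumed. It shows that once each group starts on its own cache line boundary, the fields written in the fast path no longer share a line with the read-mostly ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHELINE 64 /* assumed cache line size */

/* Hypothetical stand-ins; sizes are illustrative, not the kernel's. */
struct demo_list_head {
    void *next;
    void *prev;
    uint32_t qlen;
    uint32_t lock;
};

struct demo_stats {
    uint64_t bytes;
    uint64_t packets;
};

struct demo_qdisc {
    /* Read-mostly group: rarely modified in the fast path. */
    _Alignas(DEMO_CACHELINE) struct demo_list_head gso_skb;
    void *next_sched;
    struct demo_list_head skb_bad_txq;

    /* Write group: everything the fast path dirties, on its own line. */
    _Alignas(DEMO_CACHELINE) struct demo_list_head q;
    unsigned long state;
    struct demo_stats bstats;
    uint8_t running;
    uint32_t backlog;
};

/* Compile-time check that the write group begins on a cache line boundary. */
static_assert(offsetof(struct demo_qdisc, q) % DEMO_CACHELINE == 0,
              "write group must start on a cache line boundary");

/* Returns nonzero when two member offsets fall in the same cache line. */
static inline int demo_same_cacheline(size_t a, size_t b)
{
    return a / DEMO_CACHELINE == b / DEMO_CACHELINE;
}
```

On a typical LP64 target the read-mostly group of this stand-in occupies 56 bytes, so the alignment of the write group leaves a hole before it, much like the holes reported in the layout dump above.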
Group together changes to qdisc fields to reduce chances of false sharing if another cpu attempts to acquire the qdisc spinlock.
  qdisc_qstats_backlog_dec(sch, skb);
  sch->q.qlen--;
  qdisc_bstats_update(sch, skb);
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Prefetch the skb that we are likely to dequeue at the next dequeue(). Also call fq_dequeue_skb() a bit sooner in fq_dequeue(). This reduces the window between the read of q.qlen and changes of fields in the cache line that could be dirtied by another cpu trying to queue a packet. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Most qdiscs need to read skb->priority at enqueue() time. __dev_xmit_skb() In commit 100dfa7 ("net: dev_queue_xmit() llist adoption") I added a prefetch(next), let's add another one for the second half of skb. Note that skb->priority and skb->hash share a common cache line, so this patch helps qdiscs needing both fields. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: NipaLocal <nipa@local>
q->limit is read locklessly, add a READ_ONCE(). Fixes: 100dfa7 ("net: dev_queue_xmit() llist adoption") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: NipaLocal <nipa@local>
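For readers unfamiliar with the pattern, a lockless read of this kind can be approximated in userspace as follows. This is a rough analogy under stated assumptions: `DEMO_READ_ONCE` is a hypothetical simplified macro (it relies on the GNU `__typeof__` extension and only covers scalar fields), and `struct demo_queue` and `demo_may_enqueue()` are invented for the example. It shows the field being loaded exactly once into a local, so a concurrent writer cannot cause the same check to see two different values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified analog of the kernel's READ_ONCE(): force a
 * single volatile load of a scalar so the compiler cannot re-read it.
 */
#define DEMO_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical queue whose limit may be changed by another thread. */
struct demo_queue {
    unsigned int limit;
    unsigned int qlen;
};

/* Decide whether one more packet fits, reading q->limit exactly once. */
static inline bool demo_may_enqueue(struct demo_queue *q)
{
    unsigned int limit;

    assert(q != NULL);
    limit = DEMO_READ_ONCE(q->limit); /* single snapshot of the limit */
    return q->qlen < limit;
}
```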
Implement .ndo_tx_timeout for MANA so any stalled TX queue can be detected
and a device-controlled port reset for all queues can be scheduled to an
ordered workqueue. The reset for all queues on stall detection is
recommended by the hardware team.
The change introduces a single ordered workqueue
("mana_per_port_queue_reset_wq") with WQ_UNBOUND | WQ_MEM_RECLAIM and
queues exactly one work_struct per port onto it.
Reviewed-by: Pavan Chebbi <[email protected]>
Reviewed-by: Haiyang Zhang <[email protected]>
Signed-off-by: Dipayaan Roy <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
The AST2600 contains two dies, each with its own MAC, and these MACs require different delay configurations. Previously, these delay values were configured during the bootloader stage rather than in the driver. This change introduces the use of the standard properties defined in ethernet-controller.yaml to configure the delay values directly in the driver.
Add the new property, "aspeed,rgmii-delay-ps", to specify the delay per step of the RGMII delay chain for each MAC. On the Aspeed platform the RGMII delay chain has 32 steps, so the total delay is "aspeed,rgmii-delay-ps" * 32.
Default delay values are declared so that tx-internal-delay-ps and rx-internal-delay-ps become optional. If these properties are not present, the driver will use the default values instead.
Add conditional schema constraints for Aspeed AST2600 MAC controllers:
- For MAC0/1, aspeed,rgmii-delay-ps property is 45 ps
- For MAC2/3, aspeed,rgmii-delay-ps property is 250 ps
- Both require the "aspeed,scu" and "aspeed,rgmii-delay-ps" properties.
Other compatible values remain unrestricted.
Signed-off-by: Jacky Chou <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
The RGMII delay is configured in the SCU region on the Aspeed AST2600, therefore add the aspeed,scu property in the dtsi for the RGMII delay. The RGMII delay value differs per MAC, as listed below:
  MAC0 and MAC1 -> 45 ps
  MAC2 and MAC3 -> 250 ps
Add the "aspeed,rgmii-delay-ps" property for each MAC to specify the corresponding delay value.
Signed-off-by: Jacky Chou <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
This change sets the rx-internal-delay-ps and tx-internal-delay-ps properties to control the RGMII signal delay. The phy-mode for MAC0–MAC3 is updated to "rgmii-id" to enable TX/RX internal delay on the PHY and disable the corresponding delay on the MAC. Signed-off-by: Jacky Chou <[email protected]> Signed-off-by: NipaLocal <nipa@local>
On the AST2600 platform, the RGMII delay is controlled via the SCU registers. The delay chain configuration differs between MAC0/1 and MAC2/3, even though all four MACs use a 32-stage delay chain.
+------+----------+-----------+-------------+-------------+
|      |Delay Unit|Delay Stage|TX Edge Stage|RX Edge Stage|
+------+----------+-----------+-------------+-------------+
|MAC0/1|     45 ps|        32 |          0  |          0  |
+------+----------+-----------+-------------+-------------+
|MAC2/3|    250 ps|        32 |          0  |         26  |
+------+----------+-----------+-------------+-------------+
For MAC2/3, the "no delay" condition starts from stage 26. Setting the RX delay stage to 26 means that no additional RX delay is applied. The RX delay settings of MAC2/3 are listed below:
  26 -> 0 ns, 27 -> 0.25 ns, ... , 31 -> 1.25 ns, 0 -> 1.5 ns, 1 -> 1.75 ns, ... , 25 -> 7.75 ns
Therefore, for MAC2/3 the delay stage is computed from rx-internal-delay-ps and then offset by 26. If the resulting stage is equal to or greater than 32, it is masked with 0x1f to get the correct setting, because the delay chain wraps around like a ring. For example, if rx-internal-delay-ps for MAC2/3 is 2000 ps, the resulting delay stage is 2.
Starting with this patch, the driver will remind users of legacy dts to update "phy-mode" to "rgmii-id" and to add the corresponding RGMII delay with "rx-internal-delay-ps" and "tx-internal-delay-ps". If these properties are missing, the driver will configure the default RGMII delay, which means the driver disables the TX and RX delay on the MAC side.
Signed-off-by: Jacky Chou <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
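The MAC2/3 stage arithmetic described above can be written out as a small standalone C helper. This is a sketch under assumptions, not the driver's code: the function and macro names are hypothetical, and truncating division is assumed for delays that are not a multiple of the 250 ps unit, which the commit message does not specify.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the table above; the names themselves are hypothetical. */
#define DEMO_MAC23_DELAY_UNIT_PS  250u  /* delay per stage for MAC2/3 */
#define DEMO_MAC23_RX_EDGE_STAGE  26u   /* stage that means "no RX delay" */
#define DEMO_DELAY_STAGE_MASK     0x1fu /* 32-stage ring */

static_assert(DEMO_DELAY_STAGE_MASK + 1u == 32u, "delay chain has 32 stages");

/*
 * Convert an rx-internal-delay-ps value into the MAC2/3 RX delay stage.
 * Assumption: delays that are not a multiple of 250 ps are truncated.
 */
static inline uint32_t demo_mac23_rx_stage(uint32_t rx_delay_ps)
{
    uint32_t steps = rx_delay_ps / DEMO_MAC23_DELAY_UNIT_PS;

    /* Offset by the zero-delay stage and wrap around the ring. */
    return (steps + DEMO_MAC23_RX_EDGE_STAGE) & DEMO_DELAY_STAGE_MASK;
}
```

For the 2000 ps example from the commit message, 2000 / 250 gives 8 steps, 8 + 26 is 34, and masking with 0x1f yields stage 2.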
Alex will send phylink patches soon which will make us link up on QEMU again, but for now let's hack up the link. Gives us a chance to add another QEMU NIC test to "HW" runners in the CI. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Let's see if this increases stability of timing-related results.. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
These are unlikely to matter for CI testing and they slow things down. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
tc_actions.sh keeps hanging the forwarding tests. sdf@: tdc & tdc-dbg started intermittently failing around Sep 25th Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: NipaLocal <nipa@local>
We exclusively use headless VMs today, don't waste time compiling sound and GPU drivers. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
kmemleak auto scan could be a source of latency for the tests. We run a full scan after the tests manually, we don't need the autoscan thread to be enabled. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Reusable PR for hooking netdev CI to BPF testing.